-
Notifications
You must be signed in to change notification settings - Fork 370
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make batched use ustringhash_pod and make batched compile in Windows #1831
base: main
Are you sure you want to change the base?
Conversation
I also tried to debug a bit why "closure-string" test fails with batched avx2. oslexec/wide/wide_opclosure.cpp: in void init_closure_component() casting ClosureComponent* there into MicrofacetParams* printed the same wrong hash for ustringhash as does process_bsdf_closure() in testminimal/oslmaterial.cpp. oslexec/batched_llvm_gen.cpp and its llvm_gen_closure function: it seems like a promising place to check since it processes all the variables inside microfacet() statement in test.osl, starting with distribution ustringhash. EDIT: This place has the same wrong hash, checked with debug cout: "llvm::outs() << 'loaded: ' << *loaded << '\n';", so the problem happens before that. |
51b795e
to
58bb38e
Compare
The problem was indeed in llvm_gen_closure function:
src/liboslexec/batched_backendllvm.cpp:
src/liboslexec/llvm_util.cpp:
This function already has code to select between ustring and ustringhash based on ustring representation, but So I added code in CMakeLists.txt to use already existing OSL_USTRINGREP_IS_HASH to conditionally select hash in src/include/OSL/llvm_util.h and then enabled support for hash for batched functions in src/liboslexec/llvm_util.cpp. This fixes "closure-string" test but introduces new failed tests because batched functions in liboslexec/wide/* still use ustring_pod instead of ustringhash_pod. I enabled ustringhash_pod for wide/wide_opmatrix.hpp as a start. |
2b413f5
to
67eb1cb
Compare
I enabled ustringhash_pod for all files in liboslexec/wide/. However, there's issues with the patch. All string parameters there are now ustringhash_pod and according to this error i64 is being inputted for strings, but its giving type mismatch, not sure why.
EDIT: It was fixed in src/liboslexec/builtindecl_wide_xmacro.h replacing X with s for string parameters. |
47c66e8
to
9f8b3d0
Compare
efa389f
to
a3e70cb
Compare
There's now more code to support ustringhash_pod for batched. Some string variables from .osl files get passed as ustrings into liboslexec/wide functions while ustringhash is expected, I didn't find where that is set. |
I think we left the batched version alone (not using hashes where possible) mostly to just limit the scope of the code changes with the original hashcode changes which were mostly targeting GPU work happening on the scalar (non-batched) code path. And of course there is some shared code which pays with some confusion on which parameters it's actually aking. I don't think there is problem with moving batched to ustringhash, the tests are comprehensive and will probably catch everything |
@@ -31,10 +31,12 @@ namespace pvt { | |||
// Shims to convert llvm gen to rs free function C++ parameter types | |||
// and forward on calls to re free functions. | |||
|
|||
// TODO this can be removed now that batched supports ustringhash_pod |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this impl should be changed at all. Just people calling it maybe don't need to call it.
Then if there are no callers left, it can be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this impl should be changed at all. Just people calling it maybe don't need to call it. Then if there are no callers left, it can be removed.
Yes. I changed it because I didn't want atm. to go through all the code calling it, and instead made the change here in a single place.
You might need to look at each failing call, most likely batched_llvm_gen.cpp. |
Just one comment on converting to hashcode vs. ustrings, for "real" string operations that are not constant folded (CPU only), there will be more overhead when we do the lookup of the hashcode to real string value to perform the string op
After:
Counter argument was if you had dynamic string operations, good chance that is a much bigger performance penalty than a couple lookups. |
Regarding moving ustrings to hashes -- the primary reason to do this is to have compiled GPU shaders hit in NVidia's compiler cache, thus it is currently of no benefit to the CPU. Because the non-simd CPU code shares a bunch of declarations for this, they ended up getting changed as well. For the specialized case of CPU SIMD code, in the absence of bugs, we've found in our renderer it makes much more sense to leave it as ustrings and avoid the expensive ustring_from transformations and have strings be easy to inspect in a debugger than to change to the slower and more difficult to use ustringhash implementation. Was this only to try and fix bugs you were seeing with hash values being passed in, or were there other reasons you had? Also, it would be much better to separate the minimal test tool out into a separate pull request from any major ustring<->ustringhash changes. |
My first reason was performance. For example, hashing due to calling transform("myspace") was taking around 1% of rendering time in batched mode only. EDIT2: It wasn't unnecessary hashing I was seeing in a profiler but unnecessary hash->ustring conversions, anyway all that I could find in a profiler and some by looking at code are now removed by this patch and were partly due to wide/ code expecting user interface functions to use ustring format but they already use ustringhash. All of OSL interface code for batched seems to already use hashes. Hashes are great to use in a switch {} from performance standpoint, where the expected cases will be compile time constants. This is much faster than doing string comparisons or map lookups with a string key. This is relevant at least in renderer implementations of get_matrices, get_attributes and when parsing closure tree and inside OSL too for example when comparing noise function names and transform spaces names, some of which could be changed to switches. Hashes might be slower when printing and doing some string operations, but is much of that being run when doing production rendering? EDIT2: Also, ustringhash_pod can be passed in CPU registers into functions which might give a small performance benefit. However, the reason I ended up changing it here was to fix this testcase so another reason to consider it is maintainability and code complexity. Having non-batched and batched use different string representation internally can lead to more complex and more error prone code.
Agreed, I can make those a separate PR. EDIT: Done. |
Here's an example: testsuite/array-reg/test_varying_index_string.osl:
indirectR gets passed as ustring into stoi() in batched mode. Here's some relevant debug output with OSL_DEV_ONLY and some of my debug prints:
What I'm looking for here is where in OSL codebase variable indirectR gets set to ustring instead of ustringhash.
|
Yes. I already commented this little from performance standpoint above, but additionally: This patch currently uses real ustrings for llvm_gen_printf() variadic arguments just because changing Strutil::vsprintf to support hashes was non-trivial. I'm not sure if this always works, however, unless all of those arguments are always are created in llvm_gen_printf(). But if some mixing can work, then it would be possible to use real ustrings for strings ops or for some string ops on CPU and hashes everywhere else. |
I did manage to narrow this one down. The problem is that string array constants are stored as char* pointers or ustrings and then passed as i64 that are actually pointers through everything into stoi(). In non-batched mode, constant allocation happens at llvm_create_constant(), there's even TODO there to use hashes instead. I didn't find where the same thing happens for batched mode, one way to a fix would be to use hashes instead there. Another way to a possible fix would be to do conversion to hash at BatchedBackendLLVM::llvm_store_value(), non-batched mode already does one conversion like that there. Not sure about what code would achieve that, non-batched mode conversion looked like pointer cast and applying it just seemed to brake the format which was already correct, i64. Another possibility is to do the conversion to hash at BatchedBackendLLVM::llvm_load_value() which gets called for that const symbol before llvm_store_value(), which also calls llvm_get_pointer() for that symbol. EDIT: Also, everything I'm using in OSL seems to already work with in batched mode with this patch so I'm no longer in a hurry to get this working. I also tried a smaller patch closurefix that only changes to use ustringhash in batched llvm_gen_closure() llvm::Value* loaded = rop.llvm_load_value() but that didn't work. |
82dfc10
to
2199e96
Compare
Signed-off-by: Tuomas Tonteri <[email protected]>
Signed-off-by: Tuomas Tonteri <[email protected]>
Signed-off-by: Tuomas Tonteri <[email protected]>
Signed-off-by: Tuomas Tonteri <[email protected]>
Signed-off-by: Tuomas Tonteri <[email protected]>
Now this patch also makes batched OSL compile in Windows with MSVC and Mingw and run at least with Mingw, attempting to fix #1645. There's overlap with make batched use ustringhash_pod patch so they are currently a combined patch. This part is not exactly ready since it disables functionality, but it does run and highlights the problem parts. It looks like code in liboslexec/wide depends on Linux way of loading symbols, where loaded shared lib can access symbols from loading shared lib, which somehow differs from how Windows handles this. Unless there's some kind of linker options that solves this, I think a solution would be to make oslexec.dll supply wide dlls with callback functions for e.g. error_fmt() and few other functions that are called this way. As is, this patch just comments out those calls from liboslexec/wide. |
I'm not sure I understand why this is failing CI. Can you rebase to current main and try pushing again to trigger a CI run? |
"Make batched use ustringhash_pod" patch works but is still missing string array constant support, that alone fails CI. I would really ask help from someone who knows the code better to fix that. See discussion above. "make batched compile in Windows" patch works but disables features which likely fail CI. It probably belongs in a separate patch that is worked on only after "make batched use ustringhash_pod" has been merged. |
What's the status here? |
Status unchanged. |
Description
EDIT:
Make batched and non-batched internally use the same string representation for reasons discussed below.
Tests
New test cases aren't necessarily required.
This PR would fix new failing test case in #1832.
Checklist:
already run clang-format v17 before submitting, I definitely will look at
the CI test that runs clang-format and fix anything that it highlights as
being nonconforming.